
where $Y_{gj}$ is the observed read count of the gene of interest $g$ in sample $j$, $K_j$ is the library size of sample $j$ (total number of aligned reads), and $Y_{gr}$ is the observed read count of the same gene $g$ in the reference sample $r$, which has library size $K_r$.

The gene expression fold changes, $M_g$, are then trimmed by 30%, and a weighted average of the trimmed $M_g$ values is taken, using the inverse of the variances of the gene read counts ($V_{g_i}$) as weights, because the log-fold changes from genes with larger read counts have lower variance on the logarithmic scale. The TMM adjustment, $f_j$, is given as

$$
f_j = \frac{\sum_{i}^{n} V_{g_i} M_{g_i}}{\sum_{i}^{n} V_{g_i}} \tag{5.5}
$$

The adjustment $f_j$ is an estimate of the relative RNA production of the two samples. The TMM normalization factor for sample $j$ with $m$ genes is given by

$$
N_j = f_j \sum_{g=1}^{m} Y_{gj} \tag{5.6}
$$

TMM does not correct the observed read counts for gene length; hence, it is not suitable for comparing the expression of different genes within the same sample.
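The TMM calculation in Eqs. 5.5 and 5.6 can be illustrated with a minimal Python sketch. It assumes that the fold changes and the weights have already been computed for each gene, and that "trimmed by 30%" means discarding the lowest 30% and the highest 30% of the fold changes. The function names `tmm_adjustment` and `tmm_normalization_factor`, and the two-sided trimming rule, are assumptions made for illustration rather than details given in the text.

```python
import numpy as np


def tmm_adjustment(M, V, trim=0.3):
    """Weighted mean of the trimmed fold changes (a sketch of Eq. 5.5).

    M    : per-gene fold changes, assumed to be precomputed
    V    : per-gene weights, assumed to be precomputed
    trim : fraction removed from EACH tail of M (assumption: two-sided trim)
    """
    M = np.asarray(M, dtype=float)
    V = np.asarray(V, dtype=float)
    order = np.argsort(M)                    # gene indices sorted by fold change
    cut = int(np.floor(trim * M.size))       # number of genes dropped per tail
    keep = order[cut:M.size - cut]           # genes that survive the trimming
    return float(np.sum(V[keep] * M[keep]) / np.sum(V[keep]))


def tmm_normalization_factor(f_j, counts_j):
    """Adjustment times the total read count of sample j (a sketch of Eq. 5.6)."""
    return f_j * float(np.sum(counts_j))


# Hypothetical toy values: ten fold changes with equal weights.
M = [-5, -1, 0, 1, 2, 0.5, 0.2, 0.1, 8, 3]
f_j = tmm_adjustment(M, np.ones(len(M)))
N_j = tmm_normalization_factor(f_j, [10, 20, 30])
```

With equal weights the sketch reduces to an ordinary trimmed mean, so the extreme fold changes in the toy data (-5 and 8) have no influence on the result.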

5.3.5.5  Relative Expression

For a given sample $j$, the relative expression (RE) scaling factor is calculated as the median, over genes, of the ratios of the observed counts to the geometric mean of the counts across all samples (the pseudo-reference sample, $r$) [28]. The scaling factors are calculated as follows:

$$
N_j = \operatorname{median}_{g} \frac{Y_{gj}}{\left( \prod_{r=1}^{n} Y_{gr} \right)^{1/n}} \tag{5.7}
$$
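The median-of-ratios calculation in Eq. 5.7 can be sketched in Python as follows. The sketch assumes a count matrix with genes in rows and samples in columns, and it skips genes that have a zero count in any sample, because their geometric mean is zero and the ratio is undefined. The function name `relative_expression_factors` and the zero-handling rule are assumptions made for illustration.

```python
import numpy as np


def relative_expression_factors(counts):
    """Median-of-ratios scaling factor per sample (a sketch of Eq. 5.7).

    counts : 2-D array-like, genes in rows and samples in columns
    """
    counts = np.asarray(counts, dtype=float)
    # Assumption: use only genes with a positive count in every sample.
    positive = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[positive])
    # Log of the per-gene geometric mean across samples (pseudo-reference).
    log_geo_mean = log_counts.mean(axis=1)
    # Log ratio of each observed count to the pseudo-reference.
    log_ratios = log_counts - log_geo_mean[:, None]
    # Median over genes, returned on the original scale.
    return np.exp(np.median(log_ratios, axis=0))


# Hypothetical toy data: sample 2 is sequenced twice as deeply as sample 1.
counts = [[10, 20],
          [100, 200],
          [5, 10],
          [0, 7]]   # skipped: zero count in sample 1
factors = relative_expression_factors(counts)
```

Working on the log scale avoids overflow when multiplying many counts together; the geometric mean becomes an arithmetic mean of logs.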

5.3.5.6  Upper Quartile

The upper quartile (UQ) normalization factor is computed as the sample upper quartile (75th percentile) of the gene counts, after excluding the genes that have zero counts in all samples [29].
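A minimal Python sketch of the upper quartile factor is given below. It assumes a count matrix with genes in rows and samples in columns, and it interprets the zero-count rule as removing the genes whose counts are zero in every sample before the percentile is taken. The function name `upper_quartile_factors` is hypothetical.

```python
import numpy as np


def upper_quartile_factors(counts):
    """75th percentile of gene counts per sample (a sketch of UQ normalization).

    counts : 2-D array-like, genes in rows and samples in columns
    """
    counts = np.asarray(counts, dtype=float)
    # Assumption: drop genes whose count is zero in all samples.
    expressed = np.any(counts > 0, axis=1)
    # Upper quartile of the remaining counts, computed column by column.
    return np.percentile(counts[expressed], 75, axis=0)


# Hypothetical toy data: the first gene is zero everywhere and is dropped.
counts = [[0, 0],
          [1, 2],
          [2, 4],
          [3, 6],
          [4, 8],
          [5, 10]]
factors = upper_quartile_factors(counts)
```

Note that `np.percentile` uses linear interpolation by default, so other quantile conventions may give slightly different values on small data sets.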

5.3.6  Differential Expression Analysis

Most differential expression programs have functions that normalize gene expression

count data as part of the analysis. The design of the study is crucial in the differential

expression analysis as we discussed in the introduction of this chapter. The conditions

that group the samples and sample replicates must be determined as metadata before the

analysis. A simple gene expression study is made up of two groups (e.g., treated and control